fix: test_professionalism - AttributeError: 'tuple' object has no attr… #2470
Open
Angelenx wants to merge 2 commits into confident-ai:main from
Conversation
Someone is attempting to deploy a commit to the Confident AI Team on Vercel. A member of the Team first needs to authorize it.
Contributor
PR author is not in the allowed authors list.
Author
The current version may fail with errors like the following when using OpenRouter's LLM API:
🙌 Congratulations! You're now using OpenRouter `openai/gpt-5-mini` for all evals that require an LLM.
🎯 Evaluating test case #0 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% 0:00:11
FRunning teardown with pytest sessionfinish...
============================================================================================ FAILURES ============================================================================================
______________________________________________________________________________________ test_professionalism ______________________________________________________________________________________
def test_professionalism():
dotenv.load_dotenv(dotenv_path=".env.local")
model = OpenRouterModel(
model=os.getenv("OPENROUTER_MODEL_NAME", "openai/gpt-5-mini"),
api_key=os.getenv("OPENROUTER_API_KEY", ""),
)
professionalism_metric = ConversationalGEval(
name="Professionalism",
criteria="Determine whether the assistant has acted professionally based on the content.",
threshold=0.5,
model=model
)
test_case = ConversationalTestCase(
turns=[
Turn(role="user", content="What is DeepEval?"),
Turn(role="assistant", content="DeepEval is an open-source LLM eval package.")
]
)
> assert_test(test_case, [professionalism_metric])
tests/test_basic.py:28:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/home/angelen/miniconda3/envs/deepeval/lib/python3.12/site-packages/deepeval/evaluate/evaluate.py:135: in assert_test
test_result = loop.run_until_complete(
/home/angelen/miniconda3/envs/deepeval/lib/python3.12/asyncio/base_events.py:691: in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
/home/angelen/miniconda3/envs/deepeval/lib/python3.12/site-packages/deepeval/evaluate/execute.py:678: in a_execute_test_cases
await asyncio.wait_for(
/home/angelen/miniconda3/envs/deepeval/lib/python3.12/asyncio/tasks.py:520: in wait_for
return await fut
^^^^^^^^^
/home/angelen/miniconda3/envs/deepeval/lib/python3.12/site-packages/deepeval/evaluate/execute.py:581: in execute_with_semaphore
return await _await_with_outer_deadline(
/home/angelen/miniconda3/envs/deepeval/lib/python3.12/site-packages/deepeval/evaluate/execute.py:300: in _await_with_outer_deadline
return await asyncio.wait_for(coro, timeout=timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
/home/angelen/miniconda3/envs/deepeval/lib/python3.12/asyncio/tasks.py:520: in wait_for
return await fut
^^^^^^^^^
/home/angelen/miniconda3/envs/deepeval/lib/python3.12/site-packages/deepeval/evaluate/execute.py:923: in _a_execute_conversational_test_cases
await measure_metrics_with_indicator(
/home/angelen/miniconda3/envs/deepeval/lib/python3.12/site-packages/deepeval/metrics/indicator.py:235: in measure_metrics_with_indicator
await asyncio.gather(*tasks)
/home/angelen/miniconda3/envs/deepeval/lib/python3.12/site-packages/deepeval/metrics/indicator.py:248: in safe_a_measure
await metric.a_measure(
/home/angelen/miniconda3/envs/deepeval/lib/python3.12/site-packages/deepeval/metrics/conversational_g_eval/conversational_g_eval.py:176: in a_measure
await self._a_generate_evaluation_steps()
/home/angelen/miniconda3/envs/deepeval/lib/python3.12/site-packages/deepeval/metrics/conversational_g_eval/conversational_g_eval.py:213: in _a_generate_evaluation_steps
return await a_generate_with_schema_and_extract(
/home/angelen/miniconda3/envs/deepeval/lib/python3.12/site-packages/deepeval/metrics/utils.py:464: in a_generate_with_schema_and_extract
data = trimAndLoadJson(result, metric)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
input_string = (Steps(steps=['For each turn, compare the Role field to the Content field: verify the content aligns with the declared... then label the assistant as professional / borderline / unprofessional with brief supporting examples.']), 0.00116275)
metric = <deepeval.metrics.conversational_g_eval.conversational_g_eval.ConversationalGEval object at 0x79a233c24200>
def trimAndLoadJson(
input_string: str,
metric: Optional[BaseMetric] = None,
) -> Any:
> start = input_string.find("{")
^^^^^^^^^^^^^^^^^
E AttributeError: 'tuple' object has no attribute 'find'
/home/angelen/miniconda3/envs/deepeval/lib/python3.12/site-packages/deepeval/metrics/utils.py:389: AttributeError
====================================================================================== slowest 10 durations ======================================================================================
11.68s call tests/test_basic.py::test_professionalism
(2 durations < 0.005s hidden. Use -vv to show these durations.)
==================================================================================== short test summary info =====================================================================================
FAILED tests/test_basic.py::test_professionalism - AttributeError: 'tuple' object has no attribute 'find'
1 failed, 4 warnings in 11.84s
All metrics errored for all test cases, please try again.

This pull request fixes the issue.
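The `input_string` line in the traceback shows the root cause: the model call returned a `(result, cost)` tuple (a parsed `Steps` object plus a cost of 0.00116275), and `a_generate_with_schema_and_extract` passed that tuple straight into `trimAndLoadJson`, which expects a JSON string. A hypothetical reproduction of the type mismatch (the `Steps` schema name and values are taken from the traceback; the rest is illustrative, not library code):

```python
# Hypothetical reproduction of the type mismatch shown in the traceback.
# `Steps` stands in for the pydantic schema named in the error output.
from typing import List
from pydantic import BaseModel


class Steps(BaseModel):
    steps: List[str]


# What the model call effectively returned: (parsed result, request cost).
returned = (Steps(steps=["Compare each Role field to its Content field."]), 0.00116275)

# trimAndLoadJson expects a JSON string, so attribute lookup on the tuple fails:
try:
    returned.find("{")
except AttributeError as err:
    print(err)  # 'tuple' object has no attribute 'find'
```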
This pull request improves the `a_generate_with_schema_and_extract` utility function in `deepeval/metrics/utils.py`. The update adds support for handling models that return a `(result, cost)` tuple, ensuring that cost tracking is properly accrued and the result is correctly extracted for downstream processing.

Metric cost handling and result extraction:
- Updated `a_generate_with_schema_and_extract` to handle models that return a `(result, cost)` tuple, accrue the cost using the metric's `_accrue_cost` method if available, and extract the actual result for further processing.

This pull request fixes an issue where the new LLM return format could not be parsed, resulting in the error "test_professionalism - AttributeError: 'tuple' object has no attribute 'find'".
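For readers following along, a minimal sketch of the tuple handling described above. The signature and surrounding logic are simplified assumptions, not the actual diff in `deepeval/metrics/utils.py`; only the `(result, cost)` unpacking and the `_accrue_cost` call mirror the summary.

```python
# Simplified sketch, not the actual patch: the signature and schema-instance
# shortcut are assumptions; only the (result, cost) handling follows the
# description above.
async def a_generate_with_schema_and_extract(prompt, schema, model, metric=None):
    result = await model.a_generate(prompt, schema=schema)

    # Some models return (result, cost) so that evaluation cost can be tracked.
    if isinstance(result, tuple) and len(result) == 2:
        result, cost = result
        # Accrue the cost on the metric if it supports cost tracking.
        if metric is not None and hasattr(metric, "_accrue_cost"):
            metric._accrue_cost(cost)

    # If the model already produced a parsed schema instance, use it directly.
    if isinstance(result, schema):
        return result

    # Otherwise fall back to parsing the raw JSON string, as before.
    # trimAndLoadJson is the existing helper in the same module (see traceback).
    data = trimAndLoadJson(result, metric)
    return schema(**data)
```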